[TRTLLM-12500][feat] Add support for Qwen3.5 VL MoE - REVERTED by #14599 by moraxu · Pull Request #14164 · NVIDIA/TensorRT-LLM

moraxu · 2026-05-15T01:41:51Z

Summary by CodeRabbit

New Features
- Added support for Qwen3.5 MoE vision language models with improved multimodal architecture.
- Enhanced configuration utilities for better dtype handling and model compatibility.
Tests
- Added comprehensive tests for Qwen3.5 MoE multimodal models.
- Added MMMU accuracy evaluation and reference metrics for Qwen3.5-35B-A3B.

Description

Completes Qwen3.5-MoE-VL (Qwen3_5MoeForConditionalGeneration) on top of #12611.
Switches the VLM config path to HF's native `transformers.Qwen3_5MoeConfig` (present in transformers 5.3.0), adds a thin post-load normalizer that materializes the handful of aliases the reused Qwen3Next runtime expects on `text_config` (`intermediate_size` from the MoE fields; `rope_theta`, `partial_rotary_factor`, and `rope_scaling` from `rope_parameters`), and centralizes hybrid-cache dtype resolution in two helpers.
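The alias materialization described above could look roughly like the sketch below. This is not the PR's code: the function name `normalize_text_aliases`, the source field `moe_intermediate_size`, and the layout of `rope_parameters` as a plain dict are all assumptions made for illustration, and a `SimpleNamespace` stands in for the real HF config object.

```python
from types import SimpleNamespace


def normalize_text_aliases(text_config):
    """Fill in flat aliases on a text config (hypothetical helper, illustration only)."""
    # Assumption: the alias mirrors a field named `moe_intermediate_size`;
    # the PR description only says "from the MoE fields".
    if getattr(text_config, "intermediate_size", None) is None:
        text_config.intermediate_size = getattr(
            text_config, "moe_intermediate_size", None
        )

    # Assumption: `rope_parameters` is a dict that carries these keys.
    rope = getattr(text_config, "rope_parameters", None) or {}
    for alias in ("rope_theta", "partial_rotary_factor"):
        if getattr(text_config, alias, None) is None and alias in rope:
            setattr(text_config, alias, rope[alias])

    # Assumption: the legacy `rope_scaling` alias can simply copy the dict.
    if getattr(text_config, "rope_scaling", None) is None and rope:
        text_config.rope_scaling = dict(rope)
    return text_config


# Usage with made-up values; existing attributes are never overwritten.
cfg = SimpleNamespace(
    moe_intermediate_size=512,
    rope_parameters={"rope_theta": 10000.0, "partial_rotary_factor": 0.25},
)
normalize_text_aliases(cfg)
```

Only filling attributes that are missing keeps the normalizer idempotent, so running it twice on the same config leaves it unchanged.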

Test Coverage

Accuracy & unit tests
TODO: Comparison unit tests against HF

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

moraxu · 2026-05-15T01:49:30Z

@coderabbitai full review

coderabbitai · 2026-05-15T01:49:35Z

✅ Actions performed

Full review triggered.

coderabbitai · 2026-05-15T01:55:00Z

Walkthrough

This PR adds Qwen3.5 MoE Vision Language Model support through dtype resolution utilities, config normalization for multimodal architectures, a new VLModel wrapper class, weight mapper registration, and comprehensive unit and integration tests.

Changes

Qwen3.5 MoE VLM Implementation

| Layer / File(s) | Summary |
| --- | --- |
| Config utilities and dtype resolution foundation `tensorrt_llm/_torch/pyexecutor/config_utils.py` | Introduces `_coerce_torch_dtype`, `resolve_hf_torch_dtype`, and `resolve_mamba_ssm_cache_dtype` helpers to normalize HF dtype fields and Mamba cache dtypes, with `extract_mamba_kv_cache_params` and `MambaKVCacheParams` updated to use the new resolution logic. |
| Qwen3.5 config normalization and VLM adaptation `tensorrt_llm/_torch/pyexecutor/config_utils.py` | Adds `_normalize_qwen35_mrope_config`, `_normalize_qwen35_qwen3next_text_aliases`, `_normalize_qwen35_quantization_config`, and `_normalize_qwen35_moe_vl_config` to normalize mRoPE aliases, quantization exclude-modules, and VLM model wiring. `load_pretrained_config` now loads `qwen3_5_moe` VLM checkpoints as `Qwen3_5MoeConfig` and applies VLM normalization. |
| VL base class architecture and embedding support `tensorrt_llm/_torch/models/modeling_qwen3vl.py` | `Qwen3VLModelBase.__init__` adds support for `Qwen3_5MoeForConditionalGeneration` architecture mapping, and `init_mrope_embedding` improves `head_dim` derivation via `getattr` with fallback to computed value. |
| Qwen3.5 MoE VLModel wrapper class and wiring `tensorrt_llm/_torch/models/modeling_qwen3_5.py`, `tensorrt_llm/_torch/models/checkpoints/hf/qwen3_5_weight_mapper.py`, `tensorrt_llm/_torch/models/__init__.py` | `Qwen3_5MoeVLModel` class registered for `Qwen3_5MoeForConditionalGeneration`, composing vision encoder with text decoder, defining `multimodal_data_device_paths`, and implementing custom `load_weights` with conditional vision loading, namespace remapping, and `Qwen3_5MoeHfWeightMapper` integration. Weight mapper registered for VLM checkpoint, and model exported via `__init__`. |
| Model loader dtype resolution integration `tensorrt_llm/_torch/pyexecutor/model_loader.py` | `validate_and_set_mamba_ssm_cache_dtype` updated to use `resolve_mamba_ssm_cache_dtype` and `resolve_hf_torch_dtype` helpers in precedence chain for dtype determination. |
| Qwen3Next load_weights parameter extension `tensorrt_llm/_torch/models/modeling_qwen3_next.py` | `Qwen3NextForCausalLM.load_weights` extended with optional `params_map` and `allow_partial_loading` parameters forwarded to superclass. |
| Module export list formatting `tensorrt_llm/_torch/configs/__init__.py` | Config module `__all__` reformatted to multi-line list. |
| Unit tests for Qwen3.5 VLM config and model resolution `tests/unittest/_torch/modeling/test_modeling_qwen3_5_vl_moe.py` | Helper function `_write_qwen35_moe_vl_config` and tests validate config architecture preservation, `mamba_ssm_cache_dtype` resolution, auto-model/mapper selection, and multimodal placeholder metadata registration. |
| Integration accuracy tests for Qwen3.5-35B-A3B VLM `tests/integration/defs/accuracy/references/mmmu.yaml`, `tests/integration/defs/accuracy/test_llm_api_pytorch_multimodal.py` | MMMU accuracy reference (59.0) added; new `TestQwen3_5_35B_A3B_VL` integration test class with memory gating, sampling configuration, and MMMU evaluation at `max_batch_size=32`. |
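
The `head_dim` derivation mentioned for `init_mrope_embedding` (an explicit attribute read via `getattr`, with a computed fallback) follows a common pattern for HF-style configs. A minimal sketch, assuming the fallback is `hidden_size // num_attention_heads`; the helper name `derive_head_dim` is hypothetical and the walkthrough does not spell out the exact expression used in the PR:

```python
from types import SimpleNamespace


def derive_head_dim(config):
    """Prefer an explicit `head_dim`; otherwise compute it from the hidden size."""
    head_dim = getattr(config, "head_dim", None)
    if head_dim is None:
        # Assumed fallback: evenly split the hidden size across attention heads.
        head_dim = config.hidden_size // config.num_attention_heads
    return head_dim


# A config that sets head_dim explicitly keeps it, even when it differs
# from hidden_size // num_attention_heads.
explicit = SimpleNamespace(head_dim=256, hidden_size=2048, num_attention_heads=16)
# A config without head_dim falls back to the computed value.
implicit = SimpleNamespace(hidden_size=2048, num_attention_heads=16)
```

Checking for `None` rather than truthiness also covers configs that define the attribute but leave it unset.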

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

2ez4bz
yechank-nvidia
syuoni
xinhe-nv

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 32.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Description check | ⚠️ Warning | The PR description is incomplete. While it explains the core objective (adding Qwen3.5-MoE-VL support), it lacks critical required sections. | Add a clear title with [TRTLLM-12500][feat] prefix, expand the Description section with detailed technical explanation, and replace 'TODO' with completed test coverage documentation. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title '[TRTLLM-12500][feat] Add support for Qwen3.5 VL MoE' clearly and concisely describes the main feature addition - Qwen3.5 Vision Language MoE support - matching the changeset which adds VLM infrastructure and models. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tensorrt_llm/_torch/pyexecutor/config_utils.py`:
- Around line 48-52: resolve_hf_torch_dtype and resolve_mamba_ssm_cache_dtype
call _coerce_torch_dtype on each candidate attribute but immediately return its
result, so a returned None from _coerce_torch_dtype (the "auto" sentinel)
prematurely stops the fallback chain; change both functions to only return the
coerced dtype when _coerce_torch_dtype(...) is not None, otherwise continue
scanning the remaining attributes (i.e., call getattr for each attr, call
_coerce_torch_dtype, and if the result is truthy/not None then return it; if
None keep looping and finally return None).
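
The fix requested in this comment can be sketched as follows. The snippet is a stand-in, not the code from `config_utils.py`: dtypes are represented as plain strings so it runs without PyTorch, and the helper names, the alias table, and the attribute names `dtype` and `torch_dtype` are assumptions made for illustration.

```python
from types import SimpleNamespace

# Hypothetical alias table; the real helper returns torch.dtype objects.
_DTYPE_ALIASES = {
    "float16": "float16",
    "half": "float16",
    "bfloat16": "bfloat16",
    "float32": "float32",
}


def coerce_dtype(value):
    """Return a canonical dtype name, or None for missing, "auto", or unknown values."""
    if value is None or value == "auto":
        return None
    return _DTYPE_ALIASES.get(str(value).removeprefix("torch."))


def resolve_dtype(config, attrs=("dtype", "torch_dtype")):
    """Scan candidate attributes and return the first one that coerces to a dtype."""
    for attr in attrs:
        coerced = coerce_dtype(getattr(config, attr, None))
        if coerced is not None:
            # Only a concrete dtype ends the search.
            return coerced
        # "auto" or a missing attribute: keep scanning the remaining candidates.
    return None


# The first attribute is the "auto" sentinel, so the second one must win.
cfg = SimpleNamespace(dtype="auto", torch_dtype="torch.bfloat16")
resolved = resolve_dtype(cfg)
```

Returning inside the loop without the `is not None` check would hand back `None` for `cfg` above, which is the premature stop the review comment describes.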

In `@tests/integration/defs/accuracy/test_llm_api_pytorch_multimodal.py`:
- Around line 438-447: The new test method test_auto_dtype lacks an explicit
return type; update its signature to include "-> None" (i.e., def
test_auto_dtype(self) -> None:) to comply with repository typing rules and
mypy-friendly guidelines—locate the test_auto_dtype method that constructs
LLM(...) and calls task.evaluate on MMMU(...) and add the return annotation
there.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: b09424f2-ec07-43fb-b7b5-e73404064a0a

📥 Commits

Reviewing files that changed from the base of the PR and between d75df19 and 44ca139.

📒 Files selected for processing (11)

tensorrt_llm/_torch/configs/__init__.py
tensorrt_llm/_torch/models/__init__.py
tensorrt_llm/_torch/models/checkpoints/hf/qwen3_5_weight_mapper.py
tensorrt_llm/_torch/models/modeling_qwen3_5.py
tensorrt_llm/_torch/models/modeling_qwen3_next.py
tensorrt_llm/_torch/models/modeling_qwen3vl.py
tensorrt_llm/_torch/pyexecutor/config_utils.py
tensorrt_llm/_torch/pyexecutor/model_loader.py
tests/integration/defs/accuracy/references/mmmu.yaml
tests/integration/defs/accuracy/test_llm_api_pytorch_multimodal.py
tests/unittest/_torch/modeling/test_modeling_qwen3_5_vl_moe.py

moraxu · 2026-05-18T18:21:13Z

/bot run

moraxu · 2026-05-18T18:26:52Z

/bot run

tensorrt-cicd · 2026-05-18T18:32:41Z

PR_Github #48959 [ run ] triggered by Bot. Commit: 9438b0d Link to invocation

tensorrt-cicd · 2026-05-18T19:54:56Z

PR_Github #48959 [ run ] completed with state SUCCESS. Commit: 9438b0d
/LLM/main/L0_MergeRequest_PR pipeline #38705 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

moraxu · 2026-05-18T19:56:04Z

/bot run

tensorrt-cicd · 2026-05-18T20:05:27Z

PR_Github #48976 [ run ] triggered by Bot. Commit: 9438b0d Link to invocation

tensorrt-cicd · 2026-05-18T21:56:26Z

PR_Github #48976 [ run ] completed with state SUCCESS. Commit: 9438b0d
/LLM/main/L0_MergeRequest_PR pipeline #38721 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

moraxu · 2026-05-18T21:59:21Z

/bot run

moraxu · 2026-05-18T21:59:49Z

/bot kill

tensorrt-cicd · 2026-05-18T22:04:57Z

PR_Github #48987 [ run ] triggered by Bot. Commit: 9438b0d Link to invocation

moraxu · 2026-05-21T03:39:51Z

/bot help

github-actions · 2026-05-21T03:39:59Z

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

Details

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental) --high-priority]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Supports wildcard * for pattern matching (e.g., "*PerfSanity*" matches all stages containing PerfSanity). Examples: "A10-PyTorch-1, xxx", "PerfSanity". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only supports [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Supports wildcard * for pattern matching. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx", --extra-stage "Post-Merge".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

--high-priority (OPTIONAL) : Run the pipeline with high priority. This option is restricted to authorized users only and will route the job to a high-priority queue.

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

moraxu · 2026-05-21T03:40:39Z

/bot reuse-pipeline

tensorrt-cicd · 2026-05-21T03:47:14Z

PR_Github #49571 [ reuse-pipeline ] triggered by Bot. Commit: ee6511e Link to invocation

tensorrt-cicd · 2026-05-21T03:54:12Z

PR_Github #49571 [ reuse-pipeline ] completed with state SUCCESS. Commit: ee6511e
Reusing PR_Github #49455 for commit ee6511e

Link to invocation

Tabrizian

Reviewed py_executor/* changes and LGTM.

moraxu · 2026-05-26T23:21:23Z

This PR was later reverted due to MTP issues, see the follow up PR here: #14599

github-actions Bot assigned moraxu May 15, 2026

moraxu mentioned this pull request May 15, 2026

[None][feat] Add the Qwen3.5 multimodal support. #12611

Closed

1 task

moraxu requested review from 2ez4bz and yechank-nvidia May 15, 2026 01:43

moraxu commented May 15, 2026

Comment thread tests/integration/defs/accuracy/test_llm_api_pytorch_multimodal.py

coderabbitai Bot reviewed May 15, 2026

Comment thread tensorrt_llm/_torch/pyexecutor/config_utils.py

Comment thread tests/integration/defs/accuracy/test_llm_api_pytorch_multimodal.py Outdated

2ez4bz reviewed May 15, 2026

2ez4bz reviewed May 16, 2026

Comment thread tensorrt_llm/_torch/pyexecutor/config_utils.py Outdated

Comment thread tensorrt_llm/_torch/pyexecutor/config_utils.py Outdated

Comment thread tensorrt_llm/_torch/pyexecutor/config_utils.py Outdated

Comment thread tests/integration/defs/accuracy/test_llm_api_pytorch_multimodal.py

moraxu marked this pull request as ready for review May 18, 2026 18:21

moraxu requested review from a team as code owners May 18, 2026 18:21

moraxu requested a review from 2ez4bz May 18, 2026 18:21

moraxu requested a review from Tabrizian May 18, 2026 18:21

moraxu force-pushed the qwen3_5_vl_moe branch from 7ad87e1 to 9438b0d Compare May 18, 2026 18:26

nv-guomingz and others added 8 commits May 20, 2026 20:28

[None][feat] Add the Qwen3.5 multimodal support.

d917e03

Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>

Qwen3.5 VL MoE Working draft

fef5361

Signed-off-by: Michal Guzek <mguzek@nvidia.com>

Address review comments

1e9a866

Signed-off-by: Michal Guzek <mguzek@nvidia.com>

Address review comments

05c59b2

Signed-off-by: Michal Guzek <mguzek@nvidia.com>

Add tests

8cf91a6

Signed-off-by: Michal Guzek <mguzek@nvidia.com>

Formatting

6b72d9d

Signed-off-by: Michal Guzek <mguzek@nvidia.com>

Address CodeRabbit review

8da13df

Signed-off-by: Michal Guzek <mguzek@nvidia.com>

Address review comments

d5e221a

Signed-off-by: Michal Guzek <mguzek@nvidia.com>

moraxu force-pushed the qwen3_5_vl_moe branch from dbf9767 to d5e221a Compare May 21, 2026 03:36

moraxu requested a review from a team as a code owner May 21, 2026 03:36

moraxu requested review from liji-nv and syuoni May 21, 2026 03:36

Restore tensorrt_llm/_torch/configs/__init__.py from main

ee6511e

Signed-off-by: Michal Guzek <mguzek@nvidia.com>

Tabrizian approved these changes May 21, 2026

moraxu merged commit 96a4a09 into NVIDIA:main May 21, 2026
7 checks passed

brnguyen2 mentioned this pull request May 22, 2026

[TRTLLM-12154][test] Add Qwen3-32B FP8 disagg stress test #14278

Merged

5 tasks

nv-guomingz mentioned this pull request May 22, 2026

[None][feat] Revert Add support for Qwen3.5 VL MoE (#14164) #14465

Merged

1 task

nv-guomingz added a commit that referenced this pull request May 23, 2026

[None][feat] Revert Add support for Qwen3.5 VL MoE (#14164) (#14465)

751be5d

KleinBlueC pushed a commit to KleinBlueC/TensorRT-LLM that referenced this pull request May 26, 2026

[None][feat] Revert Add support for Qwen3.5 VL MoE (NVIDIA#14164) (NVIDIA#14465)

75f79f9

moraxu mentioned this pull request May 26, 2026

[TRTLLM-12500][feat] Add support for Qwen3.5 VL MoE (with the MTP fixes) #14599

Open

1 task

bmarimuthu-nv pushed a commit to nv-auto-deploy/TensorRT-LLM that referenced this pull request May 28, 2026

[None][feat] Revert Add support for Qwen3.5 VL MoE (NVIDIA#14164) (NVIDIA#14465)

9ae3cbf

moraxu changed the title ~~[TRTLLM-12500][feat] Add support for Qwen3.5 VL MoE~~ [TRTLLM-12500][feat] Add support for Qwen3.5 VL MoE - REVERTED by #14599 May 30, 2026
